Probabilistic Language Modeling for Generalized LR Parsing
Authors
Abstract
Preface

In this thesis, we introduce probabilistic models to rank the likelihood of resultant parses within the GLR parsing framework. Probabilistic models can also bring the benefit of reducing the search space, if the models allow prefix probabilities for partial parses. In devising the models, we carefully observe the nature of GLR parsing, one of the most efficient parsing algorithms in existence, and formalize two probabilistic models with the appropriate use of the parsing context. The context in GLR parsing is provided by the constraints afforded by context-free grammars in generating an LR table (global context), and the constraints of adjoining pre-terminal symbols (local n-gram context).

In this research, firstly, we conduct both model analyses and quantitative evaluation on the ATR Japanese corpus to evaluate the performance of the probabilistic models. Ambiguities arising from multiple word segmentation and part-of-speech candidates in parsing a non-segmenting language are taken into consideration. We demonstrate the effectiveness of combining contextual information to determine word sequences in the word segmentation process, define parts-of-speech for words in the part-of-speech tagging process, and choose between possible constituent structures, in single-pass morphosyntactic parsing. Secondly, we apply empirical evaluation to show that the performance of the probabilistic GLR parsing model (PGLR) using an LALR table is in no way inferior to that of using a CLR table, despite the states in a CLR table providing more precise context than those in an LALR table. Thirdly, we propose a new node-driven parse pruning algorithm based on the prefix probability of PGLR, which is effective in beam-search-style parsing. The pruning threshold is estimated from the number of state nodes up to the current parsing stage. The algorithm yields significant reductions in both parsing time and computational resources. Finally, a further PGLR model is formalized which overcomes some problematic issues by way of increasing the context used in parsing.

Acknowledgments

First and foremost, I would like to thank my supervisor, Prof. Hozumi Tanaka, for his guidance, support and encouragement throughout the years of my Ph.D. studentship. I would also like to thank Assoc. Prof. Takenobu Tokunaga for his wealth of valuable comments on this research, and the other members of my thesis committee. Additionally, I would like to express my gratitude to members of the Tanaka & Tokunaga laboratories for their contributions to this research, amongst whom I can never thank Kentaro Inui, Kiyoaki Shirai …
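To make the pruning idea concrete, the following is a minimal Python sketch of beam-style pruning of partial GLR parses by prefix probability. It is an illustration under simplifying assumptions, not the thesis's node-driven algorithm: the class Hypothesis, the field prefix_logprob, and the fixed beam ratio are hypothetical names introduced here, and the thesis instead derives its threshold from the number of state nodes up to the current parsing stage.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Hypothesis:
    """One partial GLR parse (a path through the graph-structured stack)."""
    state_nodes: List[int]        # LR states pushed so far
    prefix_logprob: float = 0.0   # log of the prefix probability of this partial parse

def prune_hypotheses(hyps: List[Hypothesis],
                     log_beam: float = math.log(1e-3)) -> List[Hypothesis]:
    """Keep only hypotheses whose prefix probability is within a fixed
    ratio (the beam) of the best hypothesis at the current parsing stage.

    Simplification: a constant beam is used here, whereas the thesis
    estimates the threshold from the number of state nodes so far.
    """
    if not hyps:
        return hyps
    best = max(h.prefix_logprob for h in hyps)
    return [h for h in hyps if h.prefix_logprob >= best + log_beam]

# Illustrative use: three competing partial parses after reading some prefix.
hyps = [
    Hypothesis(state_nodes=[0, 3, 7], prefix_logprob=math.log(0.020)),
    Hypothesis(state_nodes=[0, 3, 8], prefix_logprob=math.log(0.015)),
    Hypothesis(state_nodes=[0, 5],    prefix_logprob=math.log(0.00001)),
]
surviving = prune_hypotheses(hyps)
print(len(surviving))  # the third, low-probability parse falls outside the beam
```

Working in log space keeps the comparison numerically stable, since prefix probabilities shrink rapidly as more of the input is consumed.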
Similar resources
A New Formalization of Probabilistic GLR Parsing
This paper presents a new formalization of probabilistic GLR language modeling for statistical parsing. Our model inherits its essential features from Briscoe and Carroll's generalized probabilistic LR model [3], which obtains context-sensitivity by assigning a probability to each LR parsing action according to its left and right context. Briscoe and Carroll's model, however, has a drawback in ...
A New Probabilistic LR Language Model for Statistical Parsing
This paper presents a newly formalized probabilistic LR language model. Our model inherits its essential features from Briscoe and Carroll's generalized probabilistic LR (PLR) model [3], which obtains context-sensitivity by assigning a probability to each LR parsing action according to its left and right context. However, our model is simpler while maintaining a higher degree of context-sensiti...
Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars
We describe work toward the construction of a very wide-coverage probabilistic parsing system for natural language (NL), based on LR parsing techniques. The system is intended to rank the large number of syntactic analyses produced by NL grammars according to the frequency of occurrence of the individual rules deployed in each analysis. We discuss a fully automatic procedure for constructing an...
String Shuffling over a Gap between Parsing and Plan Recognition
We propose a new probabilistic plan recognition algorithm YR based on an extension of Tomita’s Generalized LR (GLR) parser for grammars enriched with the shuffle operator. YR significantly outperforms previous approaches based on topdown parsers, shows more consistent run times among similar libraries, and degrades more gracefully as plan library complexity increases. YR also lifts the restrict...
A structured statistical language model conditioned by arbitrarily abstracted grammatical categories based on GLR parsing
This paper presents a new statistical language model for speech recognition, based on Generalized LR parsing. The proposed model, the Abstracted Probabilistic GLR (APGLR) model, is an extension of the existing structured language model known as the Probabilistic GLR (PGLR) model. It can predict next words from arbitrarily abstracted categories. The APGLR model is also a generalization of the or...
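The snippets above all build on the same underlying idea: the probability of a parse is approximated as the product of probabilities assigned to the LR parsing actions taken while building it, each conditioned on its surrounding context. As a schematic only, in notation introduced here for illustration rather than taken from any one of the cited papers:

```latex
% Schematic of probabilistic (G)LR parsing: a parse T built by the action
% sequence a_1, ..., a_n has its probability approximated by the product of
% per-action probabilities, each conditioned on the current LR state s_i and
% the lookahead (left/right) context l_i. Illustrative notation only.
P(T) \;\approx\; \prod_{i=1}^{n} P(a_i \mid s_i, l_i)
```

The models differ mainly in how this conditioning context is defined and normalized, for example whether a state reached by a shift or by a reduce is treated differently, or whether the categories in the context are abstracted.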